week 6: integers and other monsters

counts

today

  • when your outcome is a count
  • when your outcome is a category

counts

Count variables are less common as outcomes in psychology, but they’re extremely useful. They have an interpretable metric, and they’re usually in units that are meaningful (and possibly important).

The maximum-entropy family of distributions matching the constraints of a count variable – non-negative integers with no upper bound (unlike the binomial) – is the Poisson family. These distributions are defined by a single parameter, \(\lambda\).

As a reminder, the mean and the variance of the Poisson distribution are both equal to \(\lambda\); use that knowledge to interpret your estimates and set your priors.
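A quick sanity check in base R (a simulation sketch, with an arbitrary value of \(\lambda\)) makes the mean–variance equality visible:

```r
# for a Poisson, the sample mean and sample variance should
# both land near lambda
set.seed(1)           # arbitrary seed, for reproducibility
lambda <- 4
x <- rpois(1e5, lambda)
mean(x)  # close to 4
var(x)   # also close to 4
```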

The conventional link function for the Poisson is the log link, which ensures that \(\lambda\) is always positive.

visualizing priors

Our brms code will use a log-link function to estimate \(\lambda\) from a linear model:

log(lambda) = a + b*var
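Because the inverse link is exponential, coefficients act multiplicatively on \(\lambda\): a one-unit change in var multiplies \(\lambda\) by exp(b). A tiny illustration, using made-up values for a and b:

```r
# hypothetical coefficient values, for illustration only
a <- 1
b <- 0.3
exp(a + b * 0)                   # lambda when var = 0
exp(a + b * 1)                   # lambda when var = 1
exp(a + b * 1) / exp(a + b * 0)  # ratio = exp(b), about 1.35
```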

Your priors for a and b will likely be Gaussian, for example:

a ~ Normal(0,1)

But the transformation is an exponential one, meaning that the relationship between your linear coefficients and the resulting \(\lambda\) is difficult to predict: a Gaussian prior on the log scale implies a lognormal distribution for \(\lambda\) itself.

Code
tibble(x       = c(3, 22),
       y       = c(0.055, 0.04),
       meanlog = c(0, 3),
       sdlog   = c(10, 0.5)) %>% 
  expand_grid(number = seq(from = 0, to = 100, length.out = 200)) %>% 
  mutate(density = dlnorm(number, meanlog, sdlog),
         group   = str_c("alpha%~%Normal(", meanlog, ", ", sdlog, ")")) %>% 
  
  ggplot(aes(fill = group, color = group)) +
  geom_area(aes(x = number, y = density),
            alpha = 3/4, linewidth = 0, position = "identity") +
  geom_text(data = . %>% group_by(group) %>% slice(1),
            aes(x = x, y = y, label = group),
            family = "Times", parse = TRUE, hjust = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("counts") +
  theme(legend.position = "none")

Here’s some code to play with to help you visualize the effects of your prior.

# ---- change these ----
mean_of_prior = 1
sd_of_prior   = .5

# --- leave this alone ---

  # plot
  data.frame(x = seq(from = 0, to = 100, length.out = 200)) %>% 
    mutate(dx = dlnorm(x, mean_of_prior, sd_of_prior)) %>% 
    ggplot(aes(x = x, y = dx)) +
    geom_area(alpha = 3/4, linewidth = 0, position = "identity")
  # expected value of lambda
  rlnorm(1e5, mean_of_prior, sd_of_prior) %>% mean
[1] 3.077702
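That simulated mean agrees with the analytic expectation of a lognormal, \(E[\lambda] = \exp(\mu + \sigma^2/2)\), which you can use instead of simulating:

```r
# analytic expected value of lambda when log(lambda) ~ Normal(mu, sigma)
mu    <- 1
sigma <- 0.5
exp(mu + sigma^2 / 2)  # ~3.08, matching the simulation
```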

populations and tools

data(Kline, package = "rethinking")
Kline <- Kline
Kline
      culture population contact total_tools mean_TU
1    Malekula       1100     low          13     3.2
2     Tikopia       1500     low          22     4.7
3  Santa Cruz       3600     low          24     4.0
4         Yap       4791    high          43     5.0
5    Lau Fiji       7400    high          33     5.0
6   Trobriand       8000    high          19     4.0
7       Chuuk       9200    high          40     3.8
8       Manus      13000     low          28     6.6
9       Tonga      17500    high          55     5.4
10     Hawaii     275000     low          71     6.6

We’ll model the idea that:

  1. The number of tools increases with the log population size. Why log? Because that’s what the theory says: that it is the order of magnitude of the population that matters, not the absolute size of it.

  2. The number of tools increases with the contact rate among islands. No nation is an island, even when it is an island. Islands that are better networked may acquire or sustain more tool types.

  3. The impact of population on tool counts is moderated by high contact. This is to say that the association between total_tools and log population depends upon contact.

Intercept only model

m1 <- brm(
  data = Kline, 
  family = poisson,
  total_tools ~ 1, 
  prior = c( prior( normal(3, 0.5), class = Intercept) ),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, 
  seed = 11,
  file = here("files/models/61.1")
)

Interaction model

Kline = Kline %>% 
  mutate(
    P = log(population)
  )
m2 <- brm(
  data = Kline, 
  family = poisson,
  bf(total_tools ~ a + b*P, 
     a + b ~ 0 + contact,
     nl = TRUE), 
  prior = c( prior( normal(3, 0.5), nlpar = a),
             prior( normal(0, 0.2), nlpar = b)),
  iter = 2000, warmup = 1000, chains = 4, cores = 4, 
  seed = 11,
  file = here("files/models/61.2")
)
m1
 Family: poisson 
  Links: mu = log 
Formula: total_tools ~ 1 
   Data: Kline (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     3.54      0.05     3.44     3.64 1.00     1716     1970

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
m2
 Family: poisson 
  Links: mu = log 
Formula: total_tools ~ a + b * P 
         a ~ 0 + contact
         b ~ 0 + contact
   Data: Kline (Number of observations: 10) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a_contacthigh     2.84      0.47     1.92     3.74 1.00     1378     1405
a_contactlow      1.69      0.29     1.13     2.25 1.00     1439     1323
b_contacthigh     0.09      0.05    -0.01     0.19 1.00     1399     1489
b_contactlow      0.19      0.03     0.14     0.25 1.00     1444     1295

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

What do these mean?

Once we’ve moved outside of the Gaussian distribution, your best bet is to push everything back through the posterior. Do NOT try to interpret the raw coefficients directly; they live on the link (log) scale.

nd <- data.frame(P = 1) # intercept only model

epred_draws(object = m1, newdata = nd) %>% 
  median_qi
# A tibble: 1 × 8
      P  .row .epred .lower .upper .width .point .interval
  <dbl> <int>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1     1     1   34.6   31.1   38.2   0.95 median qi       
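As a point-estimate cross-check, exponentiating the m1 intercept recovers the same expected count (this ignores posterior uncertainty, which is why the epred workflow is preferred):

```r
# back-transform the intercept estimate from the log scale
exp(3.54)  # ~34.5 tools, matching the epred median
```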
nd <-
  distinct(Kline, contact) %>% 
  expand_grid(P = seq(from = min(Kline$P), 
                      to = max(Kline$P), 
                      length.out = 100))

f2 <- epred_draws(object = m2, newdata = nd) %>% 
  group_by(contact, P) %>% 
  median_qi

f2
# A tibble: 200 × 8
   contact     P .epred .lower .upper .width .point .interval
   <fct>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
 1 high     7.00   31.7   24.9   40.2   0.95 median qi       
 2 high     7.06   31.9   25.1   40.3   0.95 median qi       
 3 high     7.11   32.0   25.4   40.3   0.95 median qi       
 4 high     7.17   32.2   25.6   40.3   0.95 median qi       
 5 high     7.23   32.4   25.8   40.3   0.95 median qi       
 6 high     7.28   32.5   26.1   40.4   0.95 median qi       
 7 high     7.34   32.7   26.3   40.4   0.95 median qi       
 8 high     7.39   32.8   26.6   40.4   0.95 median qi       
 9 high     7.45   33.0   26.9   40.5   0.95 median qi       
10 high     7.51   33.2   27.1   40.5   0.95 median qi       
# ℹ 190 more rows
f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = Kline, aes(y = total_tools, color = contact)) +
  labs(x = "log population", y = "number of tools") 

compare

m1  <- add_criterion(m1, "loo")
m2 <- add_criterion(m2, "loo")

loo_compare(m1, m2, criterion = "loo") %>% print(simplify = F)
   elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
m2   0.0       0.0   -45.7      7.2         8.2   3.4     91.3  14.3   
m1 -24.9      14.0   -70.6     16.9         7.9   3.5    141.2  33.7   
loo(m2) %>% loo::pareto_k_table()
Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     9     90.0%   232     
   (0.7, 1]   (bad)      0      0.0%   <NA>    
   (1, Inf)   (very bad) 1     10.0%   <NA>    
m2k = m2$criteria$loo$pointwise %>% 
  as.data.frame() %>% 
  mutate(culture = Kline$culture) %>% 
  arrange(desc(influence_pareto_k)) %>% 
  mutate_if(is.double, round, digits = 2)
m2k
   elpd_loo mcse_elpd_loo p_loo looic influence_pareto_k    culture
1     -7.12          0.49  3.27 14.24               1.27     Hawaii
2     -6.44          0.06  1.60 12.87               0.56      Tonga
3     -9.23          0.05  1.82 18.45               0.43  Trobriand
4     -3.28          0.01  0.20  6.56               0.29      Manus
5     -4.51          0.04  0.77  9.02               0.29   Malekula
6     -3.71          0.02  0.30  7.42               0.26        Yap
7     -3.12          0.01  0.12  6.23               0.23   Lau Fiji
8     -2.74          0.01  0.06  5.47               0.17 Santa Cruz
9     -2.93          0.01  0.04  5.86               0.16      Chuuk
10    -2.59          0.00  0.03  5.19               0.13    Tikopia

Adding these to our plot:

Code
m2k = m2k %>% full_join(Kline)
library(ggrepel)
f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = m2k, 
             aes(y = total_tools, 
                 size = influence_pareto_k,
                 color = contact)) +
  geom_text_repel(data = m2k, 
             aes(y = total_tools,
                 label = culture)) +
  guides(size = "none") +
  labs(x = "log population", y = "number of tools") +
  theme(legend.position = "top")

Natural scale

Code
m2k = m2k %>% full_join(Kline)
f2 %>% 
  ggplot(aes(x = P)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = contact), 
              alpha = .3) +
  geom_smooth(aes(y = .epred, color = contact)) +
  geom_point(data = m2k, 
             aes(y = total_tools, 
                 size = influence_pareto_k,
                 color = contact)) +
  geom_text_repel(data = m2k, 
             aes(y = total_tools,
                 label = culture)) +
  scale_x_continuous(trans = "exp",
                     breaks = log(c(0, 50000, 150000, 250000)),
                     labels = c(0, 50000, 150000, 250000)) +
  guides(size = "none") +
  labs(x = "population", y = "number of tools") +
  theme(legend.position = "top")

multinomial models

The binomial distribution is appropriate when you only have two outcomes, but what if you have multiple unordered outcomes? In that case, you should use the multinomial distribution, of which the binomial is just a special case. If there are \(K\) types of events with probabilities \(p_1,...,p_K\), then the probability of observing \(y_1,...,y_K\) events of each type out of \(n\) total trials is:

\[ \text{Pr}(y_1,...,y_K| n, p_1,...,p_K) = \frac{n!}{\prod_{i=1}^K y_i!}\prod_{i=1}^Kp_i^{y_i} \]
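Base R’s dmultinom() computes this probability mass function; here is a small check with arbitrary values, comparing it against the formula written out directly:

```r
y <- c(2, 1, 1)        # observed counts of each event type (n = 4)
p <- c(0.5, 0.3, 0.2)  # event probabilities
dmultinom(y, prob = p)                              # 0.18
# the same quantity from the formula
factorial(sum(y)) / prod(factorial(y)) * prod(p^y)  # 0.18
```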

The link function used with the multinomial is the multinomial logit, which is also called the softmax function:

\[ \text{Pr}(k|s_1,...,s_K) = \frac{\text{exp}(s_k)}{\sum_{i=1}^K\text{exp}(s_i)} \]
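A one-line base R sketch of softmax (the simulation code later uses rethinking:::softmax, which should agree):

```r
softmax <- function(s) exp(s) / sum(exp(s))

softmax(c(0.5, 1.0, 2.5))        # probabilities summing to 1
# only score *differences* are identified: shifting every score by a
# constant leaves the probabilities unchanged
softmax(c(0.5, 1.0, 2.5) - 2.5)  # identical result
```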

The biggest issue is what to do with the multiple linear models. In a binomial GLM, you can pick either of the two possible events and build a single linear model for its log odds; the other event is handled automatically. But in a multinomial GLM, you need \(K - 1\) linear models for \(K\) types of events. One of the outcome values is chosen as a “pivot” and the others are modeled relative to it. In each of the \(K - 1\) linear models, you can use any predictors and parameters you like; they don’t have to be the same, and there are often good reasons for them to be different. In the special case of two types of events, none of these choices arise, because there is only one linear model. And that’s why the binomial GLM is so much easier.

There are two basic cases:

  1. predictors have different values for different values of the outcome, and

  2. parameters are distinct for each value of the outcome.

The first case is useful when each type of event has its own quantitative traits, and you want to estimate the association between those traits and the probability each type of event appears in the data.

The second case is useful when you are interested instead in features of some entity that produces each event, whatever type it turns out to be.

example: predictors matched to outcomes

You are modeling career choice for young adults. One predictor of choice is income.

# simulate career choices among 500 individuals
N <- 500 # number of individuals
income <- c(1,2,5) # expected income of each career
score <- 0.5*income # scores for each career, based on income

# next line converts scores to probabilities
p <- rethinking:::softmax(score[1],score[2],score[3])

# now simulate choice
# outcome career holds event type values, not counts
career <- rep(NA,N) # empty vector of choices for each individual
# sample chosen career for each individual

set.seed(34302)
for ( i in 1:N ) career[i] <- sample( 1:3 , size=1 , prob=p )
# put them in a tibble
d <-
  tibble(career = career) %>% 
  mutate(career_income = ifelse(career == 3, 5, career))

# plot 
d %>%
  ggplot(aes(x = career)) +
  geom_bar(linewidth = 0)

We have to choose a reference category. For this example, we’ll choose career 3. Let’s fit an intercept-only model to start. The next question is: what are our priors?

get_prior(data = d, 
          family = categorical(link = logit, refcat = 3),
          career ~ 1)
                prior     class coef group resp dpar nlpar lb ub  source
 student_t(3, 0, 2.5) Intercept                  mu1             default
 student_t(3, 0, 2.5) Intercept                  mu2             default
m3 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      career ~ 1,
      prior = c(prior(normal(0, 1), class = Intercept, dpar = mu1),
                prior(normal(0, 1), class = Intercept, dpar = mu2)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      file = here("files/models/61.3"))
posterior_summary(m3) %>% round(2)
                Estimate Est.Error    Q2.5   Q97.5
b_mu1_Intercept    -2.01      0.16   -2.33   -1.71
b_mu2_Intercept    -1.53      0.12   -1.77   -1.29
Intercept_mu1      -2.01      0.16   -2.33   -1.71
Intercept_mu2      -1.53      0.12   -1.77   -1.29
lprior             -5.04      0.39   -5.83   -4.33
lp__             -373.67      1.04 -376.56 -372.65

It turns out these are related to our original data-generating values!

Code
tibble(income = c(1, 2, 5)) %>% 
  mutate(score = 0.5 * income) %>% 
  mutate(rescaled_score = score - 2.5)
# A tibble: 3 × 3
  income score rescaled_score
   <dbl> <dbl>          <dbl>
1      1   0.5           -2  
2      2   1             -1.5
3      5   2.5            0  
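Pushing those rescaled scores through a softmax (hand-rolled here, and using the true simulation scores rather than the posterior estimates) approximately reproduces the fitted category probabilities:

```r
softmax <- function(s) exp(s) / sum(exp(s))
# true scores rescaled so career 3 (the refcat) sits at 0
softmax(c(-2, -1.5, 0))  # ~0.10, 0.16, 0.74, close to the epred medians
```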
data.frame(id = 1) %>% 
  add_epred_draws(m3) %>% 
  median_qi()
# A tibble: 3 × 9
     id  .row .category .epred .lower .upper .width .point .interval
  <dbl> <int> <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
1     1     1 1         0.0993 0.0744  0.129   0.95 median qi       
2     1     1 2         0.161  0.131   0.195   0.95 median qi       
3     1     1 3         0.739  0.700   0.775   0.95 median qi       
data.frame(id = 1) %>% 
  add_predicted_draws(m3)  %>% 
  count(.prediction)
# A tibble: 3 × 4
# Groups:   id, .row [1]
     id  .row .prediction     n
  <dbl> <int> <fct>       <int>
1     1     1 1             376
2     1     1 2             662
3     1     1 3            2962

Before we move on, here’s the exact same model expressed with non-linear syntax (this will make more sense later).

# nonlinear syntax
m3_nonlinear <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1),
         nlf(mu2 ~ a2),
         a1 + a2 ~ 1),
      prior = c(prior(normal(0, 1), class = b, nlpar = a1),
                prior(normal(0, 1), class = b, nlpar = a2)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      file = here("files/models/61.3nl"))
m3_nonlinear
 Family: categorical 
  Links: mu1 = logit; mu2 = logit 
Formula: career ~ 1 
         mu1 ~ a1
         mu2 ~ a2
         a1 ~ 1
         a2 ~ 1
   Data: d (Number of observations: 500) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept    -2.01      0.16    -2.33    -1.71 1.00     3390     2673
a2_Intercept    -1.53      0.12    -1.77    -1.29 1.00     2964     2810

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Now we can add income. Remember, in this model, the predictor is matched to the outcome (that is, the value is the same for all observations with that outcome). So rather than using a variable from our dataset, we’ll just input the value directly into our model.

m4 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1 + b * 1),
         nlf(mu2 ~ a2 + b * 2),
         a1 + a2 + b ~ 1),
      prior = c(prior(normal(0, 1),   class = b, nlpar = a1),
                prior(normal(0, 1),   class = b, nlpar = a2),
                prior(normal(0, 0.5), class = b, nlpar = b,  lb = 0)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      control = list(adapt_delta = .99),
      file = here("files/models/61.4"))
m4
 Family: categorical 
  Links: mu1 = logit; mu2 = logit 
Formula: career ~ 1 
         mu1 ~ a1 + b * 1
         mu2 ~ a2 + b * 2
         a1 ~ 1
         a2 ~ 1
         b ~ 1
   Data: d (Number of observations: 500) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept    -2.15      0.19    -2.59    -1.81 1.00      777     1018
a2_Intercept    -1.81      0.28    -2.52    -1.41 1.01      750      706
b_Intercept      0.15      0.13     0.01     0.49 1.01      605      760

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

To interpret these coefficients, we can use counterfactual simulation. The question is: how much would the probability of choosing Career 2 change if its expected income were twice what it is now?

post = as_draws_df(m4) 
s1      = post$b_a1_Intercept + post$b_b_Intercept * 1
s2_orig = post$b_a2_Intercept + post$b_b_Intercept * 2
s2_new  = post$b_a2_Intercept + post$b_b_Intercept * 2 * 2

# use the softmax link function to create probs
p_orig = sapply(1:nrow(post), 
                function(i) rethinking:::softmax( s1[i], s2_orig[i], 0 ))
p_new = sapply(1:nrow(post), 
                function(i) rethinking:::softmax( s1[i], s2_new[i],  0 ))

p_diff = p_new[2, ] - p_orig[2, ]
rethinking::precis(p_diff)
             mean         sd        5.5%     94.5% histogram
p_diff 0.04624929 0.04570784 0.003072838 0.1335661   ▇▂▁▁▁▁▁

example: predictors matched to observations

Suppose you are still modeling career choice. But now you want to estimate the association between each person’s family income and which career they choose. So the predictor variable must have the same value in each linear model, for each row in the data. But now there is a unique parameter multiplying it in each linear model. This provides an estimate of the impact of family income on choice, for each type of career.

n <- 500
set.seed(11)

# simulate family incomes for each individual
family_income <- runif(n)

# assign a unique coefficient for each type of event
b      <- c(-2, 0, 2)
career <- rep(NA, n)  # empty vector of choices for each individual
for (i in 1:n) {
    score     <- 0.5 * (1:3) + b * family_income[i]
    p         <- rethinking:::softmax(score[1], score[2], score[3])
    career[i] <- sample(1:3, size = 1, prob = p)
}

d <-
  tibble(career = career) %>% 
  mutate(family_income = family_income)

\[\begin{align*} s_1 &= 0.5 - 2\times\text{family_income}_i \\ s_2 &= 1.0 + 0\times\text{family_income}_i \\ s_3 &= 1.5 + 2\times\text{family_income}_i \end{align*}\]

Code
p1 <-
  d %>% 
  mutate(career = as.factor(career)) %>% 
  
  ggplot(aes(x = family_income, fill = career)) +
  geom_density(linewidth = 0, alpha = 3/4) +
  theme(legend.position = "none")
  
p2 <-
  d %>% 
  mutate(career = as.factor(career)) %>%
  
  mutate(fi = santoku::chop_width(family_income, width = .1, start = 0, labels = 1:10)) %>% 
  count(fi, career) %>% 
  group_by(fi) %>% 
  mutate(proportion = n / sum(n)) %>% 
  mutate(f = as.double(fi)) %>% 
  
  ggplot(aes(x = (f - 1) / 9, y = proportion, fill = career)) +
  geom_area() +
  xlab("family_income, discretized")

p1 + p2

To model these data, we only need to adapt our code from model 4:

m5 <-
  brm(data = d, 
      family = categorical(link = logit, refcat = 3),
      bf(career ~ 1,
         nlf(mu1 ~ a1 + b1 * family_income),
         nlf(mu2 ~ a2 + b2 * family_income),
         a1 + a2 + b1 + b2 ~ 1),
      prior = c(prior(normal(0, 1),   class = b, nlpar = a1),
                prior(normal(0, 1),   class = b, nlpar = a2),
                prior(normal(0, 0.5), class = b, nlpar = b1,  lb = 0), 
                prior(normal(0, 0.5), class = b, nlpar = b2,  lb = 0)),
      iter = 2000, warmup = 1000, cores = 4, chains = 4,
      seed = 11,
      control = list(adapt_delta = .99),
      file = here("files/models/61.5"))
m5
 Family: categorical 
  Links: mu1 = logit; mu2 = logit 
Formula: career ~ 1 
         mu1 ~ a1 + b1 * family_income
         mu2 ~ a2 + b2 * family_income
         a1 ~ 1
         a2 ~ 1
         b1 ~ 1
         b2 ~ 1
   Data: d (Number of observations: 500) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept    -2.33      0.18    -2.69    -1.99 1.00     2434     2318
a2_Intercept    -1.61      0.13    -1.88    -1.36 1.00     1916     1852
b1_Intercept     0.10      0.10     0.00     0.37 1.00     2019     1400
b2_Intercept     0.11      0.10     0.00     0.38 1.00     1394     1177

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s get those posterior predictions.

nd <- tibble(family_income = seq(from = 0, to = 1, length.out = 60))

(pred = nd %>% add_epred_draws(m5))
# A tibble: 720,000 × 7
# Groups:   family_income, .row, .category [180]
   family_income  .row .chain .iteration .draw .category .epred
           <dbl> <int>  <int>      <int> <int> <fct>      <dbl>
 1             0     1     NA         NA     1 1         0.0782
 2             0     1     NA         NA     2 1         0.0658
 3             0     1     NA         NA     3 1         0.0800
 4             0     1     NA         NA     4 1         0.0725
 5             0     1     NA         NA     5 1         0.0862
 6             0     1     NA         NA     6 1         0.0843
 7             0     1     NA         NA     7 1         0.0758
 8             0     1     NA         NA     8 1         0.0980
 9             0     1     NA         NA     9 1         0.0995
10             0     1     NA         NA    10 1         0.0564
# ℹ 719,990 more rows
pred %>% 
  mean_qi() %>% 
  ggplot(aes(x = family_income, y = .epred)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper, fill = .category),
              alpha = .3) +
  geom_line(aes(color = .category)) +
  facet_wrap(~ .category, nrow = 1) +
  guides(color = "none", fill = "none") + 
  labs(x = "family income", y = "probability of career choice")